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Abstract 

This report contains a tutorial introduction to the method of importance sampling. 
The use of this method is illustrated for simulations of the noise-induced energy 
jitter of return-to-zero pulses in optical communication systems. 
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When one rolls a die, the probability of it stopping with a particular number face- up is 1/6. 
To measure such a probability experimentally and accurately, one must roll the die many 
more than 6 times. With this fact in mind, consider the design of optical communication 
systems. A common way to evaluate system prototypes is to measure (in computer simu- 
lations or laboratory experiments) the distance over which they can transmit information 
with bit-error-ratios of order 10~ 9 (without forward error correction). Even with current 
computers, it is impractical to simulate pulse transmission many more than 10 9 times. 

One can use importance-sampling techniques to circumvent this difficulty Put simply, bit 
errors are caused by large system perturbations (deviations from ideal behavior) that occur 
infrequently. When one makes importance-sampled simulations of system performance one 
biases them in such a (controlled) way that these large perturbations occur more often 
than they should. Because they occur often, one can measure their statistics accurately. 
Subsequently, one adjusts the simulation results to remove their artificial bias. In this way 
one obtains results that are both unbiased and accurate. 

Pioneering importance-sampled simulations of phase noise in an optical system were made 
by Foschini and Vannuci jTj. Recently, importance-sampled simulations of polarization- mode 
dispersion were made by Biondini et al. 2 a . Importance-sampling methods were reviewed 
comprehensively by Smith et al. [3] and ways to combine data from different importance- 
sampled simulations were discussed by Veach In this report we make no attempt to 
duplicate their discussions. Rather, we show by example how to apply importance-sampling 
techniques to optical systems. 

It is instructive to consider the die example quantitatively. Each roll of an unbiased die 
is a trial, in which the probability of a successful outcome p = 1/6 and the value associated 
with a successful outcome (the weight with which a successful outcome is counted) w = 1. 
For each trial the expected value of the outcome is pw and the variance is w 2 p(l— p). Suppose 
that an unbiased die is rolled n\ times and let f\ be the measured 1-frequency (the number of 
Is divided by the number of rolls). Then the expected value of the 1-frequency E(fi) = 1/6. 
By adding ui terms of magnitude 5/36 and dividing the result by n\ one finds that the 
variance V(fi) = 5/36ni: To measure the 1-frequency with an accuracy of 1% (to conduct 
an experiment for which the standard deviation of the 1-frequency is 1% of the expected 
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value) one would have to roll the die 5 x 10 4 times. This would be a time-consuming task. 

If the 5- and 6-frequencies were of no interest one could bias the die by marking the 5- 
and 6-faces with Is. Since this change would increase the probability of rolling a 1 {jp — 
3/6 = 1/2), one should count each successful roll with reduced weight {w = 1/3). Suppose 
that this biased die is rolled n 2 times and let f 2 be the measured (weighted) 1-frequency. 
Then the expected value E(f 2 ) = 1/6, as required, and the variance V(f%) = l/36n 2 : 
By causing the desired event to occur more often than it would naturally, one is able to 
measure its probability more accurately, or with fewer rolls. (For rare events that occur in 
applications, the performance improvements are much larger than the factor of 5 associated 
with this example.) Of course, the price one pays for this increase in accuracy is the loss of 
information about the 5- and 6-frequencies. 

If these frequencies were of limited, but finite, interest, one could combine the measure- 
ments made with each die separately. If one were to weight the two measurements equally, 
by defining the combined 1-frequency / = (/i + /2V2, one would find that E(f) = 1/6 and 
V(f) = 5/144ni + l/144n 2 : The accuracy of the combined measurement is limited by the 
less-accurate individual measurement (the one with the larger variance coefficient or the one 
made with fewer rolls). One should weight the individual measurements in proportion to the 
numbers of rolls used to make them, and in inverse proportion to their variance coefficients. 
Let / = c\f\ + C2/2, where the condition c\ + c 2 = 1 ensures that E(f) = 1/6. Then a short 
calculation shows that the optimal values of C\ and c 2 are rii/(ni + 5n 2 ) and 5n 2 / (ni + 5n 2 ), 
respectively, in which case the minimal variance V(f) = 5/36(ni + 5n 2 ). For the case in 
which ni = n 2 = n/2, where n is the total number of rolls, V(f) = 5/108n. 

In this dice example simple formulas for the individual variances exist, which allow one to 
determine the optimal weight coefficients precisely. However, in applications such formulas 
might be complicated or unknown. Consequently, a different method is required. An effective 
method, which is called the balance heuristic [I] , is to weight each successful outcome (roll of 
either die) equally. In this method, if the unbiased die is rolled ri\ times and the biased die 
is rolled n 2 times, the combined probability of rolling a 1 is (rii + 3ra 2 )/6(n 1 + n 2 ). To ensure 
that the expected value of the combined measurement is 1/6, each successful roll (of either 
die) should be counted with weight w = {rii + n 2 )/{ni + 3n 2 ). By adding rii contributions 
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of magnitude w 2 pi(l — pi) and n 2 contributions of magnitude w 2 p2(l — P2), and dividing 
the result by {n\ + r^) 2 , one finds that the combined variance is (5ni + 9n2)/36(ni + 3n2) 2 . 
For the case in which n\ = n 2 = n/2 the combined variance is 7/144n, which is only 5% 
larger than the minimal variance of the preceding paragraph. Thus, one should weight each 
outcome in inverse proportion to its combined probability. 

Now consider the noise-induced energy (amplitude) jitter of a return-to-zero (RZ) pulse. 
Provided that the pulse energy e is greater than the noise energy in the surrounding bit slot, 
it evolves according to the stochastic ordinary differential equation (ODE) 



where g(z) is the amplifier gain rate, a is the fiber loss rate and r(z) is the rate at which 
the energy is changed by amplifier noise jSHH]- This random rate of change is quantified by 
the equations (r(z)) = and (r(z)r(z')) = {2n sp hvge)8{z — z'), where ( ) denotes an ensem- 
ble average, n sp is the spontaneous-emission factor (1.1-1.3) and hv is the photon energy. 
Equation (1) is valid for any isolated pulse and an arbitrary combination of distributed and 
lumped amplification. 

For definiteness, consider a 10 Gb/s system with uniformly-distributed amplification 
(g = a), in which a = 0.21 dB/Km, (3 = —0.30 ps 2 /Km (D = 0.38 ps/Km-nm) and 7 = 
1.7/Km-W. Then a soliton with a full-width at half-maximum of 30 ps has an energy of 21 fj 
(time-averaged power of 0.21 mW). If the system length I = 10 Mm the output noise power in 
both polarizations, in a frequency bandwidth of 12 GHz (wavelength bandwidth of 0.1 nm), is 
1.7 /iW: The (optical) signal-to-noise ratio is 21 dB. Systems with nonuniformly-distributed 
or lumped amplification produce the same noise power in shorter distances. 

For uniformly-distributed amplification Eq. ((TJ can be rewritten in the canonical form 



where x = e/eo is the energy, normalized to the equilibrium energy (in the absence of noise), 
and y is a Wiener process (Gaussian random variable) with (y) = and (y 2 ) = a 2 z, where 
the normalized source-strength a 2 = 2n sv hvg / €q. In the linear regime the multiplicative 
factor x 1 / 2 pa 1, from which it follows that x ~ 1 + y: The probability-density function 
(PDF) of the output energies is Gaussian, with mean 1 and variance a 2 z. (From a logical 



d z e = (g- a)e + r, 



(1) 



dx = x 1 ^ 2 dy, 



(2) 
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standpoint the PDF of the non-negative quantity x cannot be exactly Gaussian, because, if 

it were, the probability of x < would be finite for all z > 0. From a practical standpoint 

this inconsistency is tolerable if the probability of x < is exponentially small for system 

lengths of interest.) For the aforementioned system the output variance a 2 l = 6.6 x 1CT 3 

(which is of order 10 -2 ) and the output deviation is 8.1 x 10~ 2 (which is of order 10 _1 ). In 

the nonlinear regime the factor x 1 ! 2 modifies the tails of the PDF significantly. For reference, 

the analytical solution of Eq. (J2J) has the PDF 

cosh(mx 1 / 2 /f) exp[— (m 2 + x)/2v] 
P[X) ~ (2nxvy/ 2 ' (3) 

where m — 1 — cr 2 z/8 and v = <J 2 z/4 jjj. 

Equation (2) and solution (3) model energy jitter in a (continuous) system with uniformly- 
distributed amplification. We simulated a (discrete) system with n« = 100 lumped amplifiers. 
Between the amplifiers the energy x did not change. At the ith amplifier the energy was 
changed (kicked) by the random amount dyi, where the properties (Syi) = and (5yi6yj) = 
10~ A 5ij ensured that ((J2i=i ^Vi) 2 ) = 10~ 2 : The discrete system had the same characteristics 
as the continuous system. The output energies were assigned to energy bins of (common) 
width 0.02 and each bin probability pj was defined to be the number of pulses whose energies 
fell within the bin boundaries (bin count) divided by the total number of pulses. (Since 
probabilities cannot be measured by finite numbers of trials, these quantities should be 
called the relative frequencies associated with the bins. We use the term probabilities as an 
abbreviation for the correct term.) To facilitate comparisons to the analytical PDF (|3*|). the 
simulation probabilities were defined to be the bin probabilities divided by the bin width. 
In these (direct) simulations the occurrence of each output energy was counted with unit 
weight. 

The PDF associated with an ensemble of 10 6 pulses is displayed in Fig. 1. Although 
the simulation results agree well with Eq. (JHJ) near the peak of the PDF, they do not even 
begin to sample the tail of the PDF. On a 1-GHz PC these simulations, which are based 
on Eq. (2), take a few minutes. Simulating the transmission of many more than 10 9 pulses 
would take many days. Realistic simulations, which are based on the nonlinear Schroedinger 
equation, would take even longer. 

To probe the tails of the PDF one must make large energy perturbations occur more 
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often than they would naturally. One way to achieve this goal is to increase the standard 
deviation of the energy kicks. Let go denote the (common) unbiased kick distribution, with 
deviation cxq = 1CT 2 , and q denote the (common) biased kick distribution, with deviation 
a > (T - Then, at the ith amplifier the probability that the energy is kicked by the amount 8%/i 
is increased by the factor q(5yi) / qo(Syi) . Since the kicks at all the amplifiers are biased, the 
output energy occurs with a probability that is larger than its natural probability by the total 
factor f t = Itfi^Syi) / qo(5yi) , which depends on the full kick sequence (5yi,5y2, ■ ■ ■ ,Sy ni ). 
One can remove this bias by counting the output energy with reduced weight: One increments 
the appropriate bin probability by 1/ ft, rather than 1. All other aspects of data counting 
remain the same. For reference, the probability factor 1/ f t is called the likelihood ratio. 

The PDF associated with an ensemble of 10 6 pulses is displayed in Fig. 2 for the case 
in which a = 1.2<7 . The results of these (importance-sampled) simulations differ from 
the previous results in two ways. First, they do probe the tail of the PDF. Although the 
simulations reproduce the shape of the analytical PDF, the simulation probabilities are not 
accurate because the number of data points that sample the tail of the PDF is still small. 
Second, by causing large kicks to happen more often, one causes small kicks to happen 
less often. Consequently, the body (peak) of the PDF is not reproduced accurately. This 
deficiency prevents one from increasing a until the tail of the PDF is sampled accurately. 

Another way to achieve the stated goal is to change the (common) mean of the kick dis- 
tributions. If the mean kick \i is positive (negative) the energy drifts toward larger (smaller) 
values. The distributions associated with 3 ensembles of 3 x 10 5 pulses are displayed in Fig. 3 
for cases in which a = cr , and \i = — 3 x 10~ 3 , and 3 x 10" 3 . These values produce data sets 
with mean energies of —0.3, 0.0 and 0.3, respectively. Although the simulation distributions 
are inaccurate for energies that are far from their mean energies (because the bin counts are 
low), they are accurate near their mean energies. It only remains to combine the individual 
distributions to produce a composite distribution that is accurate for the entire domain of 
interest. 

One way to combine the individual distributions is to weight their bin probabilities equally 
(which is equivalent to combining the data sets before sorting the output energies and the 
associated probability factors into bins). Let pjk be the jth bin probability associated with 
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the kih data set (which was produced by kick distributions with mean For each j, if the 
individual bin probabilities pjk were all zero the combined bin probability pj was defined to 
be zero and if some of the pjk were nonzero pj was defined to be their average. The results 
of this procedure are shown in Fig. 4. Although the composite distribution does cover the 
domain of interest, it is inaccurate near the boundaries of the sample spaces. Combining the 
individual distributions with equal weight allows the bodies of the distributions (which have 
high bin counts) to be polluted by the tails of neighboring distributions (which have low bin 
counts). 

The dice example suggests that it is better to weight the bin probabilities according to the 
associated bin counts. For each j, if the individual bin counts bj k were all zero the combined 
bin probability pj was defined to be zero and if some of the bin counts were nonzero pj was 
defined to be YZ=i bjkPjk/ J2k=i fyfc- The results of this procedure are shown in Fig. 5. The 
composite distribution covers, and is accurate throughout, the domain of interest. This ad- 
hoc procedure allows one to combine bin probabilities generated at different times, without 
recourse to the data sets on which they were based. 

Although the ad-hoc method works, it is not the balance heuristic. Suppose that the 
kick sequence (Syi, Sy 2 , ■ ■ ■ , Sy ni ) occurs during the first simulation, which is made using 
the biased distribution q\ (with mean \x\). Then the associated probability factor j t \ = 
n"i 1 5i(5yj) / qo(5yi) . Were the same sequence to occur during the kth simulation, the associ- 
ated probability factor would be f t k, which depends on the biased distribution q k . If the simu- 
lations involve the same number of pulses, the combined probability factor f t = Ylk=i ftk/ n k- 
(It is easy to generalize this formula.) When we made the individual simulations and sorted 
the output energies and the individual probability factors into bins, we also sorted the com- 
bined probability factors into a separate set of bins, which was common to all the simulations. 
The results of this procedure are shown in Fig. 6. The balance-heuristic method works well. 

In summary, we showed by example how to apply importance sampling techniques to 
simulations of optical communication systems. These techniques are easy to apply and 
increase significantly the accuracy with which rare events can be simulated. 

We acknowledge useful discussions with D. Chizhik, G. Foschini, R. Moore and J. Salz. 
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Postscript: Independent simulations of energy jitter were made recently by Moore et al. 
[Opt. Lett. 28, 105 (2003)], who showed that the predictions of the energy equation (JTJ) are 
consistent with the results of simulations based on the nonlinear Schroedinger equation. 
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Figure 1: Probability distribution function of the (normalized) output energies obtained by 
solving Eq. (J2J) analytically (curve) and numerically, for an ensemble of 10 6 pulses (dots). 
The standard deviation of the energy kicks was 10~ 2 . 
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Figure 2: Probability distribution function of the (normalized) output energies obtained by 
solving Eq. (J2J) analytically (curve) and numerically, for an ensemble of 10 6 pulses (dots). 
The standard deviation of the energy kicks was 1.2 x 10~ 2 . 
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Figure 3: Probability distribution functions of the (normalized) output energies obtained by 
solving Eq. (J2J) analytically (curve) and numerically, for 3 ensembles of 3 x 10 5 pulses (dots). 
For each ensemble the standard deviation of the energy kicks was 10 -2 . The mean energy 
kicks were —3 x 1CT 3 , and 3 x 1CT 3 . 
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Figure 4: Probability distribution function of the (normalized) output energies obtained by 
solving Eq. (j2J analytically (curve) and numerically, for 3 ensembles of 3 x 10 5 pulses (dots). 
For each ensemble the standard deviation of the energy kicks was 10~ 2 . The mean energy 
kicks were —3 x 10~ 3 , and 3 x 10~ 3 . The 3 sets of bin probabilities were combined without 
weighting. 
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Figure 5: Probability distribution function of the (normalized) output energies obtained by 
solving Eq. (j2J analytically (curve) and numerically, for 3 ensembles of 3 x 10 5 pulses (dots). 
For each ensemble the standard deviation of the energy kicks was 10~ 2 . The mean energy 
kicks were —3 x 10~ 3 , and 3 x 10~ 3 . When the 3 sets of bin probabilities were combined, 
they were weighted according to the associated bin counts. 
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Figure 6: Probability distribution function of the (normalized) output energies obtained by 
solving Eq. (j2J analytically (curve) and numerically, for 3 ensembles of 3 x 10 5 pulses (dots). 
For each ensemble the standard deviation of the energy kicks was 10~ 2 . The mean energy 
kicks were —3 x 10~ 3 , and 3 x 10~ 3 . When the 3 data sets were combined, the data were 
weighted according to the combined probability factors. 
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